Flexible Models for Microclustering with Application to Entity Resolution

Neural Information Processing Systems

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman--Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new class of models that can exhibit this property. We compare models within this class to two commonly used clustering models using four entity-resolution data sets.
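The linear-growth behavior the abstract criticizes can be seen in a quick simulation of the Chinese Restaurant Process, the partition prior underlying Dirichlet process mixtures. This is an illustrative sketch only, not code from the paper: with any fixed seed, the largest cluster's share of the data stays roughly constant as the data set grows, rather than vanishing as entity resolution would require.

```python
import random

def crp_cluster_sizes(n, alpha=1.0, seed=0):
    """Simulate a Chinese Restaurant Process and return cluster sizes
    after seating n points. alpha is the concentration parameter."""
    rng = random.Random(seed)
    sizes = []
    for i in range(n):
        # New cluster with probability alpha/(i + alpha);
        # otherwise join an existing cluster proportionally to its size.
        if rng.random() < alpha / (i + alpha):
            sizes.append(1)
        else:
            r = rng.random() * i  # existing sizes sum to i
            acc = 0
            for j, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[j] += 1
                    break
    return sizes

for n in (1_000, 10_000):
    sizes = crp_cluster_sizes(n)
    # The largest-cluster fraction does not shrink toward zero as n grows.
    print(n, max(sizes) / n)
```

A model with the microclustering property would instead drive `max(sizes) / n` toward zero as `n` increases.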



Controllable Patching for Compute-Adaptive Surrogate Modeling of Partial Differential Equations

Mukhopadhyay, Payel, McCabe, Michael, Ohana, Ruben, Cranmer, Miles

arXiv.org Artificial Intelligence

Patch-based transformer surrogates have become increasingly effective for modeling spatiotemporal dynamics, but a fixed patch size is a major limitation for budget-conscious deployment in production. We introduce two lightweight, architecture-agnostic modules, the Convolutional Kernel Modulator (CKM) and the Convolutional Stride Modulator (CSM), that enable dynamic patch-size control at inference in patch-based models, without retraining or accuracy loss. Combined with a cyclic patch-size rollout, our method mitigates patch artifacts and improves long-term stability for video-like prediction tasks. Applied to a range of challenging 2D and 3D PDE benchmarks, our approach improves rollout fidelity and runtime efficiency. To our knowledge, this is the first framework to enable inference-time patch-size tunability in patch-based PDE surrogates. Its plug-and-play design makes it broadly applicable across architectures, establishing a general foundation for compute-adaptive modeling in PDE surrogate tasks.
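The abstract's details of CKM and CSM are not reproduced here, but the reason patch size governs compute is easy to illustrate: in any patch-based model the token count scales as the field area divided by the squared patch size, so doubling the patch quarters the sequence the transformer must process. A minimal NumPy sketch (the `patchify` helper is hypothetical, not from the paper):

```python
import numpy as np

def patchify(x, patch):
    """Split an (H, W, C) field into non-overlapping patch tokens.
    Returns an array of shape (num_tokens, patch*patch*C); the token
    count (H/patch)*(W/patch) is what drives transformer cost."""
    H, W, C = x.shape
    assert H % patch == 0 and W % patch == 0
    return (x.reshape(H // patch, patch, W // patch, patch, C)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, patch * patch * C))

x = np.random.rand(64, 64, 3)
for p in (4, 8, 16):
    # Larger patches -> far fewer tokens, at the cost of spatial resolution.
    print(p, patchify(x, p).shape)
```

Tuning `patch` at inference time is exactly the accuracy/compute knob the paper's modulators expose without retraining.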


Reviews: Flexible Models for Microclustering with Application to Entity Resolution

Neural Information Processing Systems

The following are the main strengths of the paper. It points out and defines an important property of cluster sizes that existing infinitely exchangeable clustering models do not satisfy. Many applications, including but not limited to entity resolution, may require this property. It proposes a framework for defining infinitely exchangeable clustering models that satisfy this microclustering property, and analyzes why the DP mixture model is an unsatisfactory instance of this class. It then proposes two specific and interesting instances of this class, using particular distributions for the number of clusters and the cluster sizes, and derives reseating algorithms for these instances.


Reviews: A flexible model for training action localization with varying levels of supervision

Neural Information Processing Systems

Paper Summary: The paper describes a method for spatio-temporal human action localization in temporally untrimmed videos based on discriminative clustering [3, 47]. The main contribution is a new action detection approach that is flexible in the sense that it can be trained with varying levels and amounts of supervision. For example, the model can be trained with a very weak level of supervision, i.e., using only ground-truth video-level action labels, or with full supervision, i.e., with dense per-frame bounding boxes and their class labels. Experimental results demonstrate the strengths and weaknesses of a wide range of supervisory signals, such as video-level action labels, a single temporal point, one ground-truth bounding box, and temporal bounds. The method is evaluated on the UCF-101-24 and DALY action detection datasets.


A Comparison between Neural Networks and other Statistical Techniques for Modeling the Relationship between Tobacco and Alcohol and Cancer

Neural Information Processing Systems

Epidemiological data is traditionally analyzed with very simple techniques. Flexible models, such as neural networks, have the potential to discover unanticipated features in the data. However, to be useful, flexible models must have effective control on overfitting. This paper reports on a comparative study of the predictive quality of neural networks and other flexible models applied to real and artificial epidemiological data. The results suggest that there are no major unanticipated complex features in the real data, and also demonstrate that MacKay's [1995] Bayesian neural network methodology provides effective control on overfitting while retaining the ability to discover complex features in the artificial data.


Understanding The Accuracy-Interpretability Trade-Off

#artificialintelligence

In today's article we discussed the trade-off between model accuracy and model interpretability in the context of machine learning. Less flexible models are more interpretable and thus better suited to inference settings, where we are mostly interested in understanding the relationship between the inputs and the output. More flexible models, on the other hand, are far less interpretable, but their results can be more accurate. Depending on the problem we are working on, we may have to pick the model that best serves our use case. We should, however, keep in mind that in most cases we have to find the sweet spot between model accuracy and model interpretability.


The Computational Limits of Deep Learning

#artificialintelligence

The relationship between performance, model complexity, and computational requirements in deep learning is still not well understood theoretically. Nevertheless, there are important reasons to believe that deep learning is intrinsically more reliant on computing power than other techniques, in particular due to the role of overparameterization and how this scales as additional training data are used to improve performance (including, for example, classification error rate, root mean squared regression error, etc.). Classically this would lead to overfitting, but stochastic gradient-based optimization methods provide a regularizing effect due to early stopping [pillaud2018statistical, Belkin15849] (this is often called implicit regularization, since there is no explicit regularization term in the model), moving the neural networks into an interpolation regime, where the training data is fit almost exactly while still maintaining reasonable predictions on intermediate points [belkin2018overfitting, belkin2019does]. The challenge of overparameterization is that the number of deep learning parameters must grow as the number of data points grows. Since the cost of training a deep learning model scales with the product of the number of parameters and the number of data points, this implies that computational requirements grow as at least the square of the number of data points in the overparameterized setting.
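The quadratic-scaling argument above is simple arithmetic, and a toy cost model makes it concrete. This is an illustrative sketch under the stated assumption that parameter count grows linearly with the data set (params = k * n); the function name and constant are hypothetical:

```python
def training_cost(n_data, params_per_point=1.0):
    """Toy cost model: training cost ~ (number of parameters) x (number
    of data points). In the overparameterized regime, params ~ k * n,
    so total cost ~ k * n^2."""
    n_params = params_per_point * n_data
    return n_params * n_data

for n in (1_000, 10_000, 100_000):
    # Each 10x increase in data yields a 100x increase in cost.
    print(n, training_cost(n))
```

Under this model, collecting ten times more data multiplies training compute by one hundred, which is the "at least the square of the number of data points" claim in the text.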